Optimal data collection for informative rankings expose well-connected graphs
Authors
Abstract
Given a graph where vertices represent alternatives and arcs represent pairwise comparison data, the statistical ranking problem is to find a potential function, defined on the vertices, such that the gradient of the potential function agrees with the pairwise comparisons. Our goal in this paper is to develop a method for collecting data for which the least squares estimator for the ranking problem has maximal Fisher information. Our approach, based on experimental design, is to view data collection as a bi-level optimization problem where the inner problem is the ranking problem and the outer problem is to identify the data that maximize the informativeness of the ranking. Under certain assumptions, the data collection problem decouples, reducing to a problem of finding multigraphs with large algebraic connectivity. This reduction of the data collection problem to graph-theoretic questions is one of the primary contributions of this work. As an application, we study the Yahoo! Movie user rating data set and demonstrate that the addition of a small number of well-chosen pairwise comparisons can significantly increase the Fisher informativeness of the ranking. As another application, we study the 2011-12 NCAA football schedule and propose schedules with the same number of games that are significantly more informative. Using spectral clustering methods to identify highly connected communities within the division, we argue that the NCAA could improve its notoriously poor rankings by simply scheduling more out-of-conference games.
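The pieces of the abstract can be illustrated on a toy graph. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each comparison on arc (i, j) observes y ≈ s[j] - s[i] (this sign convention is an assumption) with i.i.d. Gaussian noise. The least squares potential then solves the normal equations L s = Bᵀy, where B is the arc-vertex incidence matrix and L = BᵀB is the multigraph Laplacian. Under that noise model the Fisher information of s is L/σ² on the subspace orthogonal to constants. If one adopts E-optimality (maximizing the smallest nontrivial eigenvalue, an assumed choice of design criterion here), the outer problem becomes maximizing the algebraic connectivity λ₂(L). The function names and the greedy edge-selection heuristic are hypothetical and illustrative only.

```python
import itertools

import numpy as np


def incidence_matrix(n, arcs):
    """Arc-vertex incidence matrix B with (B s)_e = s[j] - s[i] for arc e = (i, j)."""
    B = np.zeros((len(arcs), n))
    for e, (i, j) in enumerate(arcs):
        B[e, i] = -1.0
        B[e, j] = 1.0
    return B


def least_squares_ranking(n, arcs, y):
    """Minimum-norm least squares potential s minimizing ||B s - y||^2.

    On a connected graph s is determined only up to an additive constant;
    the pseudo-inverse returns the solution orthogonal to the all-ones vector.
    """
    B = incidence_matrix(n, arcs)
    L = B.T @ B  # Laplacian of the (multi)graph; repeated arcs add up
    return np.linalg.pinv(L) @ (B.T @ np.asarray(y, dtype=float))


def algebraic_connectivity(n, arcs):
    """Second-smallest Laplacian eigenvalue (the Fiedler value)."""
    B = incidence_matrix(n, arcs)
    eigvals = np.linalg.eigvalsh(B.T @ B)  # returned in ascending order
    return eigvals[1]


def greedy_add_comparisons(n, arcs, k):
    """Hypothetical greedy heuristic: add k comparisons one at a time, each
    chosen among all vertex pairs to maximize the resulting algebraic
    connectivity. Repeated pairs are allowed, so the result is a multigraph."""
    arcs = list(arcs)  # copy so the caller's list is not mutated
    for _ in range(k):
        best = max(
            itertools.combinations(range(n), 2),
            key=lambda pair: algebraic_connectivity(n, arcs + [pair]),
        )
        arcs.append(best)
    return arcs


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]
    print(least_squares_ranking(4, path, [1.0, 1.0, 1.0]))
    print(algebraic_connectivity(4, path))
    print(greedy_add_comparisons(4, path, 1))
```

On the path 0-1-2-3 with unit comparisons, the estimated potential is evenly spaced and sums to zero, and λ₂ = 2 - √2. Adding the single comparison (0, 3) closes the path into a 4-cycle and raises λ₂ to 2. The brute-force greedy loop recomputes a full eigendecomposition for every candidate pair, so it is only practical for small graphs. Larger instances would call for sparse eigensolvers or a smarter selection rule.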
Similar resources
Optimal Data Collection for Improved Rankings Expose Well-Connected Graphs
Given a graph where vertices represent alternatives and arcs represent pairwise comparison data, the statistical ranking problem is to find a potential function, defined on the vertices, such that the gradient of the potential function agrees with the pairwise comparisons. Our goal in this paper is to develop a method for collecting data for which the least squares estimator for the ranking pro...
Comparing rankings by means of competitivity graphs: structural properties and computation
In this paper we introduce a new technique to analyze families of rankings focused on the study of structural properties of a new type of graphs. Given a finite number of elements and a family of rankings of those elements, we say that two elements compete when they exchange their relative positions in at least two rankings. This allows us to define an undirected graph by connecting elements th...
Projection pursuit for discrete data
Abstract: This paper develops projection pursuit for discrete data using the discrete Radon transform. Discrete projection pursuit is presented as an exploratory method for finding informative low dimensional views of data such as binary vectors, rankings, phylogenetic trees or graphs. We show that for most data sets, most projections are close to uniform. Thus, informative summaries are ones d...
Distinct edge geodetic decomposition in graphs
Let $G=(V,E)$ be a simple connected graph of order $p$ and size $q$. A decomposition of a graph $G$ is a collection $\pi$ of edge-disjoint subgraphs $G_1, G_2, \ldots, G_n$ of $G$ such that every edge of $G$ belongs to exactly one $G_i$ ($1 \le i \le n$). The decomposition $\pi = \{G_1, G_2, \ldots, G_n\}$ of a connected graph $G$ is said to be a distinct edge geodetic decomposition if $g_1(G_i) \neq g_1(G_j)$ ($1 \le i \neq j \le n$). The maximum cardinality of $\pi$...
On the edge-connectivity of C_4-free graphs
Let $G$ be a connected graph of order $n$ and minimum degree $\delta(G)$. The edge-connectivity $\lambda(G)$ of $G$ is the minimum number of edges whose removal renders $G$ disconnected. It is well known that $\lambda(G) \leq \delta(G)$, and if $\lambda(G) = \delta(G)$, then $G$ is said to be maximally edge-connected. A classical result by Chartrand gives the sufficient condition $\delta(G) \geq \frac{n-1}{2}$ fo...
Journal title:
- Journal of Machine Learning Research
Volume 15, Issue
Pages -
Publication date: 2014